Abstract

A design for a fuzzy rule based classifier is proposed. The
performance of the classifier for multispectral satellite image
classification is improved using the Dempster-Shafer (D-S) theory
of evidence, which exploits information from the neighboring
pixels. The classifiers are tested rigorously on two known images,
and their performance is found to be better than the results
available in the literature. We also demonstrate that, for all the
test cases, using D-S theory along with the fuzzy rule based
classifiers improves performance over the basic fuzzy rule based
classifiers.

1. Introduction 

Analysis of satellite images has many important appli- 
cations such as prediction of storm and rainfall, esti- 
mation of natural resources, estimation of crop yields, 
assessment of damage caused by natural disasters, and 
land cover classification. In this paper we focus on land 
cover classification from multi-spectral satellite images. 

The most widely used techniques for this problem employ
discriminant analysis, maximum likelihood classification, and
neural networks [6], [3]. Such classifiers cannot handle the fact
that, for land cover, a pixel may correspond to more than one type
of object. For example, the area covered by a pixel may correspond
to 30% land and 70% water. Note that the uncertainty involved in
classifying such a pixel is not probabilistic but fuzzy in nature,
and therefore it demands "soft" classifiers. In developing soft
classifiers for land cover analysis two approaches have gained
popularity. These are based on (1) fuzzy set theory and
(2) Dempster and Shafer's (DS) evidence theory [7].

Numerous fuzzy classification techniques have been developed to
solve problems in diverse fields. A comprehensive account of such
works can be found in [2]. Fuzzy rules are attractive because they
are interpretable and provide an analyst with deeper insight into
the problem. Use of fuzzy rule based systems for land cover
analysis is relatively new. In a recent paper Bardossy and
Samaniego [1] have proposed a scheme for developing a fuzzy
rulebased classifier for analysis of multispectral images.

The other approach for designing soft classifiers is to use the
evidence theory developed by Dempster and Shafer [7]. Since the
theory of evidence allows one to combine evidence obtained from
diverse sources of information in support of a hypothesis, it seems
a natural candidate for analyzing multispectral images for land
cover classification.

Here we propose a scheme for designing fuzzy rulebased classifiers
for land cover types that uses evidence theory for decision making.
This is a two stage process. First we find a good set of fuzzy
rules using information from all channels. In the next stage, the
responses of the fuzzy rules over a 3 x 3 neighborhood are used to
define eight Basic Probability Assignments (BPAs), which are then
combined by the DS rule to exploit contextual information and make
a better decision. The problem of high variation in the variances
of different features, which often degrades the performance of a
distance based classifier substantially, is handled in a natural
manner by fuzzy rules due to the atomic nature of the antecedent
clauses.

2. Designing the Fuzzy Rule base 

The proposed scheme has several stages. First a set of labeled
prototypes is generated. Then the prototypes are converted into
fuzzy rules. The fuzzy rules are further tuned to improve their
performance. Labeled prototypes can be generated using any
clustering algorithm followed by labeling the cluster centers.
However, for most such algorithms the number of clusters is a
predefined parameter. Here we use the prototype generation scheme
described in [5]. It is a two stage algorithm involving
unsupervised and supervised learning that dynamically decides the
number of prototypes and extracts them using the training data. For
details the readers are referred to [5].



2.1. Designing the fuzzy rulebase 

A prototype $v_i$ (representing a cluster of points) for class $k$
can be translated into a fuzzy rule of the form:

$R_i$: If $x_1$ is CLOSE TO $v_{i1}$ AND $\cdots$ AND $x_p$ is
CLOSE TO $v_{ip}$ then class is $k$.

The fuzzy set CLOSE TO $v_{ij}$ is modeled by a Gaussian
membership function:

$$\mu_{ij}(x_j; v_{ij}, \sigma_{ij}) = \exp\left(-\frac{(x_j - v_{ij})^2}{\sigma_{ij}^2}\right).$$

Given a data point $x$ with unknown class, we first find the firing
strength of each rule. Let $\alpha_i(x)$ denote the firing strength
of the $i$-th rule on a data point $x$. We assign the point $x$ to
class $k$ if $\alpha_r(x) = \max_i \alpha_i(x)$ and the $r$-th rule
represents class $k$.

Each fuzzy set is characterized by two parameters, $v_{ij}$ and
$\sigma_{ij}$. The $v_{ij}$s of the rules can be initialized with
the components of the final set of prototypes, $V^{final}$,
generated by our SOFM based algorithm:

$$V^0 = V^{final} = \{v_1^{final}, \ldots, v_c^{final}\} = \{v_1^0, \ldots, v_c^0\}.$$

The superscript 0 is used to indicate that these are the initial
centers of the membership functions. The initial estimates of the
$\sigma_{ij}$s are computed as follows.

For each prototype $v_i^0$ in the set
$V^0 = \{v_i^0 \mid i = 1, \ldots, c,\ v_i^0 \in \mathbb{R}^p\}$
let $X_i$ be the set of training data closest to $v_i^0$. For each
$v_i^0$ the set

$$S_i = \left\{\sigma_{ij} \;\middle|\; \sigma_{ij} = k_w \sqrt{\frac{\sum_{x_k \in X_i} (x_{kj} - v_{ij}^0)^2}{|X_i|}}\right\}$$

is computed and is associated with the prototype. We use
$\sigma_{ij}$ as the spread of the membership function whose center
is at $v_{ij}$; $k_w > 0$ is a constant parameter, and its value
can have a significant impact on the classification performance for
complex data sets.
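The spread initialization can be sketched in Python as follows. This is only an illustration under stated assumptions: each training point is assigned to its nearest prototype in Euclidean distance, and the spread is taken to be the root mean squared deviation from the prototype along each feature, scaled by $k_w$. The function name `initial_spreads` and the toy data are hypothetical.

```python
import math

def initial_spreads(prototypes, data, k_w=1.0):
    """Return one list of spreads per prototype.

    Assumption: spread = k_w * RMS deviation of the points closest to the
    prototype, computed separately for every feature.
    """
    groups = [[] for _ in prototypes]
    for x in data:
        # Index of the prototype nearest to x (squared Euclidean distance).
        nearest = min(
            range(len(prototypes)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, prototypes[i])),
        )
        groups[nearest].append(x)

    spreads = []
    for v, members in zip(prototypes, groups):
        if not members:
            # No training point is closest to this prototype: leave it unset.
            spreads.append(None)
            continue
        spreads.append([
            k_w * math.sqrt(sum((x[j] - v[j]) ** 2 for x in members) / len(members))
            for j in range(len(v))
        ])
    return spreads

# Toy example with two prototypes in a 2-dimensional feature space.
prototypes = [[0.0, 0.0], [10.0, 10.0]]
data = [[1.0, 0.0], [-1.0, 0.0], [10.0, 12.0], [10.0, 8.0]]
spreads = initial_spreads(prototypes, data, k_w=2.0)
```

A spread of zero can occur when all points agree with the prototype along a feature; a practical implementation would need a small lower bound before using it in a membership function.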



2.2. Tuning the rulebase 

The initial rulebase $R^0$ thus obtained is further refined
to achieve better performance. The exact tuning algorithm
depends on the conjunction operator used for
computation of the firing strengths. The firing strength 
can be calculated using any T-norm [2] . Use of different 
T-norms results in different classifiers. The minimum 
and the product are among the most popular T-norms 
used as conjunction operators. It is much easier to for- 
mulate a calculus based tuning algorithm if product is 
used. However, if there are many clauses in the an- 
tecedent, the firing strength of a rule tends to have 
low numerical values even when the membership value 
of each individual clause is quite high. Though com- 
putationally this does not pose any problem (we are 
interested in relative firing strengths of the rules), it is 
conceptually somewhat unattractive - especially from 
the interpretability viewpoint. 

Thus to avoid the use of the product and at the same 
time to be able to derive update rules easily we use a 
soft-min operator. 



The soft-match of $n$ positive numbers $x_1, x_2, \ldots, x_n$ is
defined by

$$SM(x_1, x_2, \ldots, x_n, q) = \left(\frac{x_1^q + x_2^q + \cdots + x_n^q}{n}\right)^{1/q}$$

where $q \neq 0$ is a real number. SM is known as an aggregation
operator with an upper bound of 1 when $x_i \in [0, 1]\ \forall i$.
It is easy to see that
$\lim_{q \to \infty} SM(x_1, x_2, \ldots, x_n, q) = \max(x_1, x_2, \ldots, x_n)$
and
$\lim_{q \to -\infty} SM(x_1, x_2, \ldots, x_n, q) = \min(x_1, x_2, \ldots, x_n)$.
Thus we define the softmin operator as the soft-match operator with
a sufficiently negative value of the parameter $q$. The firing
strength of the $r$-th rule computed using softmin is

$$\alpha_r(x) = \left(\frac{\sum_{j=1}^{p} \big(\mu_{rj}(x_j; v_{rj}, \sigma_{rj})\big)^q}{p}\right)^{1/q}.$$

In the present study we use $q = -10.0$.
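As an illustration, the softmin firing strength and the resulting decision rule can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names, the clamp `FLOOR` (added only so that very small memberships do not overflow when raised to a negative power), and the two toy rules are assumptions; only the value q = -10 comes from the text.

```python
import math

FLOOR = 1e-12  # assumed lower clamp so that u ** q stays finite for q < 0

def gaussian_membership(x, v, sigma):
    """Membership of x in the fuzzy set CLOSE TO v with spread sigma."""
    return math.exp(-((x - v) ** 2) / sigma ** 2)

def soft_match(values, q):
    """Soft-match (power mean) of positive numbers; q must be nonzero."""
    return (sum(u ** q for u in values) / len(values)) ** (1.0 / q)

def firing_strength(x, centers, spreads, q=-10.0):
    """Softmin firing strength of one rule on the feature vector x."""
    memberships = [
        max(gaussian_membership(xj, vj, sj), FLOOR)
        for xj, vj, sj in zip(x, centers, spreads)
    ]
    return soft_match(memberships, q)

def classify(x, rules, q=-10.0):
    """Return the class label of the rule with the highest firing strength.

    rules is a list of (centers, spreads, label) triples.
    """
    best = max(rules, key=lambda r: firing_strength(x, r[0], r[1], q))
    return best[2]

# Two hypothetical rules in a 2-dimensional feature space.
rules = [
    ([0.0, 0.0], [1.0, 1.0], "water"),
    ([5.0, 5.0], [1.0, 1.0], "land"),
]
label = classify([0.1, -0.2], rules)
```

With q = -10 the soft-match of 0.2 and 0.9 lies slightly above 0.2, so the operator behaves like a smooth, differentiable minimum.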

Let $x \in X$ be from class $c$ and $R_c$ be the rule from class
$c$ giving the maximum firing strength $\alpha_c$ for $x$. Also let
$R_{\neg c}$ be the rule from the incorrect classes having the
highest firing strength $\alpha_{\neg c}$ for $x$.

We use the error function
$E = \sum_{x \in X} (1 - \alpha_c + \alpha_{\neg c})^2$.
We minimize $E$ with respect to $v_{cj}$, $v_{\neg cj}$ and
$\sigma_{cj}$, $\sigma_{\neg cj}$ of the two rules $R_c$ and
$R_{\neg c}$ using gradient descent. Here the index $j$ corresponds
to the clause number in the corresponding rule. Minimizing $E$
refines the rules with respect to their contexts in the feature
space. Note that the context referred to here is different from the
context of a pixel defined in terms of its spatial neighborhood.
The tuning process is repeated until the rate of decrease in $E$
becomes negligible, resulting in the final rule base $R^{final}$.
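The tuning objective can be illustrated with a short Python sketch. It evaluates the error function on a toy data set and takes one descent step on the rule centers. Note the assumptions: a central finite-difference gradient stands in for the calculus-based update rules (which are not written out in the text), only the centers are updated, and the names `tuning_error`, `descent_step`, the learning rate, and the toy rules are all hypothetical.

```python
import math

Q = -10.0  # softmin exponent used in the text

def firing_strength(x, centers, spreads, q=Q):
    """Softmin of Gaussian memberships (clamped to avoid overflow)."""
    mu = [max(math.exp(-((xj - vj) ** 2) / sj ** 2), 1e-12)
          for xj, vj, sj in zip(x, centers, spreads)]
    return (sum(u ** q for u in mu) / len(mu)) ** (1.0 / q)

def tuning_error(data, rules):
    """E = sum over training points of (1 - alpha_c + alpha_not_c)^2."""
    total = 0.0
    for x, label in data:
        alpha_c = max(firing_strength(x, r["v"], r["s"])
                      for r in rules if r["cls"] == label)
        alpha_not_c = max(firing_strength(x, r["v"], r["s"])
                          for r in rules if r["cls"] != label)
        total += (1.0 - alpha_c + alpha_not_c) ** 2
    return total

def descent_step(data, rules, lr=0.05, h=1e-5):
    """One gradient step on all rule centers (finite-difference gradient)."""
    grads = []
    for r in rules:
        g = []
        for j in range(len(r["v"])):
            old = r["v"][j]
            r["v"][j] = old + h
            e_plus = tuning_error(data, rules)
            r["v"][j] = old - h
            e_minus = tuning_error(data, rules)
            r["v"][j] = old
            g.append((e_plus - e_minus) / (2.0 * h))
        grads.append(g)
    for r, g in zip(rules, grads):
        r["v"] = [vj - lr * gj for vj, gj in zip(r["v"], g)]

# One-dimensional toy problem: class A near 0, class B near 3.
data = [([0.0], "A"), ([3.0], "B")]
rules = [{"v": [0.5], "s": [1.0], "cls": "A"},
         {"v": [2.5], "s": [1.0], "cls": "B"}]
error_before = tuning_error(data, rules)
descent_step(data, rules)
error_after = tuning_error(data, rules)
```

In this toy case the step moves each center toward the training point of its own class, so the error decreases.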

3. Using the theory of evidence for Rule aggregation

For the sake of completeness, we briefly introduce the
Dempster-Shafer theory of evidence. Let $\Theta$ be the universal
set and $P(\Theta)$ be its power set. A belief measure is a
function $Bel : P(\Theta) \to [0, 1]$ that satisfies the following
axioms [7].

B1: $Bel(\emptyset) = 0$ and $Bel(\Theta) = 1$.

B2: For every $A, B \in P(\Theta)$, if $A \subseteq B$ then
$Bel(A) \leq Bel(B)$.

B3: $Bel(A_1 \cup A_2 \cup \cdots \cup A_n) \geq \sum_i Bel(A_i) - \sum_{i<j} Bel(A_i \cap A_j) + \cdots + (-1)^{n+1} Bel(A_1 \cap \cdots \cap A_n)$,
for every $n$ and for every collection of $n$ subsets of $\Theta$.

There is a plausibility measure associated with each belief
measure, defined by $Pl(A) = 1 - Bel(A^c)\ \forall A \in P(\Theta)$.



Every belief measure and its dual plausibility measure can be
expressed in terms of a Basic Probability Assignment (BPA) function
$m$. A function $m : P(\Theta) \to [0, 1]$ is called a BPA iff
$m(\emptyset) = 0$ and $\sum_{A \subseteq \Theta} m(A) = 1$. A
belief measure and a plausibility measure are uniquely determined
by $m$ through the formulas:

$$Bel(A) = \sum_{B \subseteq A} m(B), \tag{1}$$

$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B). \tag{2}$$



Every set $A \in P(\Theta)$ for which $m(A) > 0$ is called a focal
element of $m$. Evidence obtained in the same context from two
distinct sources and expressed by two BPAs $m^1$ and $m^2$ on some
power set $P(\Theta)$ can be combined by Dempster's rule of
combination to obtain a joint BPA $m^{1,2}$ as:

$$m^{1,2}(A) = \begin{cases} \dfrac{\sum_{B \cap C = A} m^1(B)\, m^2(C)}{1 - K} & \text{if } A \neq \emptyset \\ 0 & \text{if } A = \emptyset \end{cases} \tag{3}$$

Here

$$K = \sum_{B \cap C = \emptyset} m^1(B)\, m^2(C).$$

Eq. (3) is often expressed with the notation
$m^{1,2} = m^1 \oplus m^2$. The rule is commutative and
associative. Evidence from any number (say $k$) of distinct sources
can be combined by repetitive application of the rule as
$m = m^1 \oplus m^2 \oplus \cdots \oplus m^k = \bigoplus_{i=1}^{k} m^i$.
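Dempster's rule of combination can be illustrated with a short Python sketch. A BPA is represented here as a dictionary that maps each focal element (a `frozenset`) to its mass; this representation, the function name `dempster_combine`, and the two-class example are assumptions made for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two BPAs given as dicts {frozenset: mass} with Dempster's rule."""
    joint = {}
    conflict = 0.0  # K: total mass of pairs whose intersection is empty
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            a = b & c
            if a:
                joint[a] = joint.get(a, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Normalize by 1 - K so that the combined masses sum to 1.
    return {a: mass / (1.0 - conflict) for a, mass in joint.items()}

W, L = frozenset({"water"}), frozenset({"land"})
WL = W | L
m1 = {W: 0.6, WL: 0.4}
m2 = {L: 0.5, WL: 0.5}
m12 = dempster_combine(m1, m2)
```

In this example the conflicting mass is K = 0.6 x 0.5 = 0.3, and the remaining products are renormalized by 1 - K = 0.7.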



3.1. Pignistic probability 

Given a belief measure we are often required to make decisions
based on the available evidence. In such a case $\Theta$ becomes
the set of decision alternatives and the function $Bel$ denotes our
belief about the choice of the optimal decision
$\theta_0 \in \Theta$. However, in general it is not possible to
select the optimal decision directly from the evidence embodied in
the function $Bel$. In such cases, we use the pignistic
transformation, $\Gamma_\Theta$, to construct a probability
function for selecting the optimal decision [8]. Thus

$$P^\Theta = \Gamma_\Theta(Bel)$$

is called a pignistic probability, which can be used for making
decisions. The pignistic probability for $\theta \in \Theta$ can be
expressed in terms of BPAs as follows:

$$P^\Theta(\theta) = \sum_{A \subseteq \Theta,\ \theta \in A} \frac{m(A)}{|A|}. \tag{4}$$

The optimal decision can now be chosen in favor of $\theta_0$ if
$\theta_0$ has the highest pignistic probability.
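The pignistic transformation of Eq. (4) amounts to sharing the mass of every focal element equally among its members. A minimal Python sketch follows, assuming a BPA stored as a dictionary from `frozenset` focal elements to masses; the function name and the example masses are hypothetical.

```python
def pignistic(m):
    """Pignistic probability of every element from a BPA {frozenset: mass}."""
    prob = {}
    for focal, mass in m.items():
        share = mass / len(focal)  # m(A) / |A|
        for theta in focal:
            prob[theta] = prob.get(theta, 0.0) + share
    return prob

m = {
    frozenset({"water"}): 0.5,
    frozenset({"land"}): 0.2,
    frozenset({"water", "land"}): 0.3,
}
prob = pignistic(m)
decision = max(prob, key=prob.get)
```

Here the mass 0.3 on the pair is split evenly, so "water" receives 0.5 + 0.15 and "land" receives 0.2 + 0.15.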



3.2. Scheme for decision making 

In our problem the frame of discernment is the set of classes,
$C = \{C_1, C_2, \ldots, C_c\}$, where $c$ is the number of
classes. The propositions take the form "the true class label of
the pixel of interest is in $A \subseteq C$".

Let us denote the pixel of interest as $p^0$ and its eight spatial
neighbors as $p^1, p^2, \ldots, p^8$. We use the firing strengths
produced by the rulebase in support of different classes for $p^0$
and one of its neighbors, say $p^i$, as the $i$-th source of
evidence. Let $r$ be the number of rules in the fuzzy rulebase.
Since $c < r$, there could be multiple rules corresponding to a
class. Let $\alpha_k^0$ be the highest firing strength produced by
the rules corresponding to the class $C_k$ for $p^0$. We treat this
value as the confidence measure of the rulebase pertaining to the
membership of $p^0$ to the class $C_k$. Thus, the set of values
$CM^0 = \{\alpha_k^0 : k = 1, 2, \ldots, c\}$ contains the
confidence measures for all the classes for $p^0$ (if a confidence
measure is less than a threshold, say 0.01, it is set to 0). A
similar set of confidence measures $CM^i$ can be constructed for
every $p^i$; $i = 1, \ldots, 8$.

Now we use $CM^0$ and $CM^i$ to define the $i$-th BPA $m^i$ on the
subsets of $C$. There are $2^c$ possible subsets of $C$, i.e.,
members of the power set of $C$. Each subset corresponds to the
proposition that the "true" class of $p^0$ is contained in that
subset. We shall consider the subsets containing one and two
elements only. The subsets containing one element correspond to
propositions of the form "the class contained in the subset is the
true class for $p^0$", and the subsets containing two elements
correspond to propositions of the form "the true class label of
$p^0$ is any one of the two classes contained in the subset".
Assigning a BPA to a subset essentially involves committing some
portion of belief in favor of the proposition represented by the
subset. So the scheme followed for assigning BPAs must reflect some
realistic assessment of the information available in favor of the
proposition. We define $m^i$ as follows:



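$$m^i(\{C_k\}) = \frac{\frac{\alpha_k^0 + \alpha_k^i}{2} \exp\!\big(-(\alpha_k^0 - \alpha_k^i)^2\big)}{S}, \quad k = 1, 2, \ldots, c \tag{5}$$

For $l, m = 1, 2, \ldots, c$ with $l < m$,

$$m^i(\{C_l, C_m\}) = \frac{\frac{\alpha_l^0 + \alpha_m^i}{2} \exp\!\big(-(\alpha_l^0 - \alpha_m^i)^2\big) + \frac{\alpha_m^0 + \alpha_l^i}{2} \exp\!\big(-(\alpha_m^0 - \alpha_l^i)^2\big)}{2S} \tag{6}$$

where the normalizing factor $S$ makes the masses sum to 1:

$$S = \sum_{k=1}^{c} \frac{\alpha_k^0 + \alpha_k^i}{2} \exp\!\big(-(\alpha_k^0 - \alpha_k^i)^2\big) + \sum_{l=1}^{c-1} \sum_{m=l+1}^{c} \frac{1}{2}\left[\frac{\alpha_l^0 + \alpha_m^i}{2} \exp\!\big(-(\alpha_l^0 - \alpha_m^i)^2\big) + \frac{\alpha_m^0 + \alpha_l^i}{2} \exp\!\big(-(\alpha_m^0 - \alpha_l^i)^2\big)\right].$$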

The numerators on the right hand side of the above formulae are
measures of confidence in favor of the respective propositions. A
closer look at (5) shows that the numerator is a product of two
terms. The first term is the average of the confidence measures of
$p^0$ and $p^i$ for the class $C_k$, while the second term is an
exponential one that reflects the degree of closeness of the
confidence measures. Thus as a whole a high value of the numerator
reflects two facts: (1) both $p^0$ and $p^i$ have high confidence
values for class $C_k$ and (2) the confidence values are close to
each other. Eq. (6) is a straightforward extension of the same
concept when we define the confidence in favor of a pair of
classes.
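The construction of one BPA from the confidence measures of the pixel of interest and one neighbor can be sketched in Python. This is a sketch under stated assumptions rather than a definitive implementation: the confidence of a singleton is the average of the two confidences multiplied by the exponential closeness term, the confidence of a pair is assumed to be the mean of the two cross terms, and all confidences are normalized to sum to 1. The names `support` and `assign_bpa` and the example values are hypothetical; the threshold 0.01 is from the text.

```python
import math
from itertools import combinations

def support(a0, ai):
    """Average of two confidences, damped by their squared difference."""
    return 0.5 * (a0 + ai) * math.exp(-((a0 - ai) ** 2))

def assign_bpa(cm0, cmi, threshold=0.01):
    """BPA over singletons and pairs of class indices.

    cm0, cmi: per-class confidence measures of the pixel of interest and
    of one neighbor. Returns a dict {frozenset of class indices: mass}.
    """
    cm0 = [a if a >= threshold else 0.0 for a in cm0]
    cmi = [a if a >= threshold else 0.0 for a in cmi]
    raw = {}
    for k in range(len(cm0)):
        raw[frozenset({k})] = support(cm0[k], cmi[k])
    for l, m in combinations(range(len(cm0)), 2):
        # Assumed pair confidence: mean of the two cross terms.
        raw[frozenset({l, m})] = 0.5 * (support(cm0[l], cmi[m]) +
                                        support(cm0[m], cmi[l]))
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("no class receives any support")
    # Keep only focal elements (positive mass) after normalization.
    return {a: v / total for a, v in raw.items() if v > 0.0}

cm_center = [0.9, 0.1, 0.0]      # confidences for the pixel of interest
cm_neighbor = [0.8, 0.2, 0.005]  # confidences for one neighbor
bpa = assign_bpa(cm_center, cm_neighbor)
strongest = max(bpa, key=bpa.get)
```

Because both pixels agree strongly on class 0, the singleton for class 0 receives the largest mass in this example.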

Thus for the eight neighboring pixels we obtain eight combinable
sources of evidence. The global BPA can be computed by applying
Dempster's rule repeatedly. The combined global BPA $m^G$ is
computed as follows:

$$m^G = (\cdots((m^1 \oplus m^2) \oplus m^3) \oplus \cdots \oplus m^8). \tag{7}$$



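It is easily seen that:

$$m^{i,j}(\{C_k\}) = m^i(\{C_k\}) \oplus m^j(\{C_k\}) = \frac{1}{1 - K}\Big[\, m^i(\{C_k\})\, m^j(\{C_k\}) + m^i(\{C_k\}) \sum_{l \neq k} m^j(\{C_k, C_l\}) + m^j(\{C_k\}) \sum_{l \neq k} m^i(\{C_k, C_l\}) + \sum_{l \neq k} m^i(\{C_k, C_l\}) \sum_{m \neq k, l} m^j(\{C_k, C_m\}) \Big],$$

$k = 1, 2, \ldots, c$, and

$$m^{i,j}(\{C_l, C_m\}) = m^i(\{C_l, C_m\}) \oplus m^j(\{C_l, C_m\}) = \frac{m^i(\{C_l, C_m\})\, m^j(\{C_l, C_m\})}{1 - K},$$

$l, m = 1, 2, \ldots, c$, $l \neq m$; where $K$ is given by

$$K = \sum_{k=1}^{c} m^i(\{C_k\}) \sum_{l \neq k} m^j(\{C_l\}) + \sum_{k=1}^{c} m^i(\{C_k\}) \sum_{l, m \neq k} m^j(\{C_l, C_m\}) + \sum_{k=1}^{c} m^j(\{C_k\}) \sum_{l, m \neq k} m^i(\{C_l, C_m\}) + \sum_{\{l, m\} \cap \{r, s\} = \emptyset} m^i(\{C_l, C_m\})\, m^j(\{C_r, C_s\}). \tag{8}$$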

Once $m^G$ is obtained the pignistic probability for each class is
computed. The following formula is used for computing the pignistic
probability of class $C_k$:

$$P^C(C_k) = m^G(\{C_k\}) + \frac{1}{2} \sum_{l \neq k} m^G(\{C_k, C_l\}). \tag{9}$$

The pixel $p^0$ is assigned to the class $C_k$ such that
$P^C(C_k) \geq P^C(C_l)\ \forall C_l \in C$.
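The whole decision step (repeated combination followed by the pignistic decision) can be sketched compactly in Python. The sketch assumes that the eight BPAs are already available as dictionaries from `frozenset` focal elements (singletons and pairs of class indices) to masses; the function names `combine` and `decide` and the example masses are hypothetical.

```python
from functools import reduce

def combine(m1, m2):
    """Dempster's rule for two BPAs {frozenset: mass}.

    Assumes the sources are not in total conflict (conflict < 1).
    """
    joint, conflict = {}, 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            a = b & c
            if a:
                joint[a] = joint.get(a, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    return {a: v / (1.0 - conflict) for a, v in joint.items()}

def decide(bpas):
    """Combine all BPAs left to right and pick the class with the
    highest pignistic probability. Returns (class, probabilities)."""
    m_global = reduce(combine, bpas)
    scores = {}
    for focal, mass in m_global.items():
        for cls in focal:
            scores[cls] = scores.get(cls, 0.0) + mass / len(focal)
    return max(scores, key=scores.get), scores

# Eight identical hypothetical sources that mildly favor class 0.
source = {frozenset({0}): 0.5, frozenset({1}): 0.2, frozenset({0, 1}): 0.3}
label, scores = decide([dict(source) for _ in range(8)])
```

Intersections of singletons and pairs are again singletons, pairs, or empty, so the combined BPA keeps the same kind of focal elements and the per-class score reduces to the singleton mass plus half of the mass of every pair containing the class.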

4. Experimental results and discussions

We report the performances of the proposed classifiers for two
multispectral satellite images. We call them Satimage1 and
Satimage2.



Image      Training  No. of  k_w  Error rate in   Error rate in
           set       rules        training data   whole image
Satimage1  1         30      5.0  12.0%           13.6%
           2         25      6.0  14.3%           14.47%
           3         25      5.0  12.0%           13.03%
           4         27      4.0  12.6%           12.5%
Satimage2  1         14      2.0  16.3%           14.14%
           2         14      2.0  16.3%           14.04%
           3         12      2.0  17.09%          14.01%
           4         11      2.0  17.34%          14.23%

Table 1: Performances of fuzzy rulebased classifiers
using firing strength for decision making for different
training sets

Satimage1 is a 256-level Landsat-TM image of size 512 x 512 pixels
captured by seven sensors operating in different spectral bands.
Each sensor generates an image with pixel values varying from 0 to
255. The 512 x 512 ground truth data provide the actual
distribution of classes of objects captured in the image. From this
data we produce the labeled data set with each pixel represented by
a 7-dimensional feature vector and a class label. Satimage2 is also
a seven channel 256-level Landsat-TM image of size 512 x 512.
However, due to some characteristic of the hardware used in
capturing the images, the first row and the last column of the
images contain gray value 0. So we did not include those pixels in
our study and effectively worked with 511 x 511 images. The ground
truth containing four classes is used for labeling the data.

In our study we generated 4 training sets of samples for each of
the images. For Satimage1, each training set contains 200 data
points randomly chosen from each of eight classes. This choice is
made to conform to the protocol followed in [4]. For Satimage2 we
include in each training set 800 randomly chosen data points from
each of four classes. Bischof et al. [3] used more training points
per class than we do.

First we report the performances of the fuzzy rulebased classifiers
using firing strengths directly for decision making and compare the
results with the published results. Then we report the performances
of the fuzzy classifiers using the evidence theoretic approach for
decision making. The performances of the fuzzy rulebased
classifiers using firing strengths directly for decision making are
summarized in Table 1.

For Satimage1 the best result reported in [4] uses a fuzzy integral
based method and gives the classification rate 78.15%. In our case,
even the worst result is about 5% better than that.

For Satimage2 the result reported in [3] shows 84.7% accuracy with
the maximum likelihood classifier (MLC) and 85.9% accuracy with a
neural network based classifier. In our case, for all training-test
partitions the fuzzy rulebased classifiers outperform the MLC and
are on par with the results reported for neural networks.

Image      Training  No. of  Error rate in
           set       rules   whole image
Satimage1  1         30      12.3%
           2         25      13.37%
           3         25      11.6%
           4         27      11.03%
Satimage2  1         14      12.7%
           2         14      12.65%
           3         12      12.4%
           4         11      12.51%

Table 2: Performances of the evidence theoretic fuzzy
classifiers for different training sets

Table 2 summarizes the performances of the fuzzy rulebased
classifiers using the evidence theoretic approach. We used the same
set of fuzzy rules as used previously, but the rule outputs are
aggregated using evidence theory.

Comparison of Table 2 with Table 1 clearly shows that in every case
there is a consistent improvement in the classification
performance. In the case of Satimage1 the improvements in
whole-image error rate varied between 1.1% and 1.5%, and the best
performing classifier (for training set 4) achieves an error rate
as low as 11.03%. For Satimage2 the improvement varied between 1.4%
and 1.7%. So the overall improvement for Satimage1 over the
existing methods is more than 7%. For Satimage2 we also achieved
consistent improvements using training sets of smaller size. For
applications like crop yield estimation even a small improvement
will have a significant impact on the overall estimate.
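The quoted ranges can be checked directly from the whole-image error rates in Tables 1 and 2. The short Python sketch below does this arithmetic; the dictionary layout and variable names are hypothetical, while the numbers are copied from the two tables and from the 78.15% classification rate reported in [4].

```python
# Whole-image error rates in percent, training sets 1 to 4.
whole_image_error = {
    "Satimage1": {"firing": [13.6, 14.47, 13.03, 12.5],
                  "evidence": [12.3, 13.37, 11.6, 11.03]},
    "Satimage2": {"firing": [14.14, 14.04, 14.01, 14.23],
                  "evidence": [12.7, 12.65, 12.4, 12.51]},
}

def gains(image):
    """Reduction in error rate (percentage points) per training set."""
    rates = whole_image_error[image]
    return [round(before - after, 2)
            for before, after in zip(rates["firing"], rates["evidence"])]

gains_1 = gains("Satimage1")
gains_2 = gains("Satimage2")

# Margin of the worst evidence theoretic classifier on Satimage1 over
# the best earlier classification rate reported in [4].
baseline_accuracy = 78.15
worst_accuracy = 100.0 - max(whole_image_error["Satimage1"]["evidence"])
margin = worst_accuracy - baseline_accuracy
```

The per-set gains fall inside the ranges quoted above once rounded to one decimal place, and the margin over the 78.15% baseline exceeds 7 percentage points even for the weakest training set.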

5. Conclusion 

We proposed two classifiers: one is fuzzy rule based and the other
integrates the outputs of fuzzy rules using the theory of evidence.
Fuzzy rules are extracted with the help of a SOFM. The system
automatically decides on the number of rules.

The fuzzy rule based classifier is of general nature 
and can be applied in any classification problem, while 
the evidence theoretic classifier exploits the spatial in- 
formation available for an image to make the classifi- 
cation decision. 

In the evidence theoretic framework we use the pixel under
consideration and one of its neighbors to provide a body of
evidence in support of different propositions regarding the class
membership (to a particular class as well as to a pair of classes)
of the pixel. The BPAs for the propositions are calculated from the
mutual confidences of the pixels in support of the respective
propositions. Eight bodies of evidence are obtained for the eight
neighbors of the pixel. These are then combined to obtain a global
body of evidence. Then the pignistic probability for each class is
computed and the pixel is assigned to the class with the highest
pignistic probability. The proposed system demonstrates a
consistent improvement in performance.